Visualizing Vancouver Street Tree Data

Mariwan Ibrahim


Introduction

Motivation

Vancouver is one of the most beautiful cities in Canada, known for the wide variety of tree species planted throughout the city. I have seen some great landscaping and tree-planting projects here; one that really fascinates me is the creative planting of trees on the walls of buildings downtown near Canada Place. Downtown, however, has limited space for trees because of its many buildings. In Renfrew-Collingwood, the area where I live, I see more trees than downtown, and perhaps more than in other areas of Vancouver. The number of trees has grown and will keep growing, and new species will be brought in and planted, but how are these trees distributed across Vancouver's neighborhoods? How does the median tree height compare across neighborhoods, and how does it differ between trees that are and are not assigned to a nearby lot? What are the average tree diameters across neighborhoods and street sides? Is there a relationship between tree diameter and height range across different areas of Vancouver? An interactive dashboard can help answer these questions.

Questions of interest

  1. Which neighborhoods have the highest and lowest number of trees?
  2. What are the differences in median tree height between assigned and unassigned trees across various neighborhoods?
  3. How does the average tree diameter vary between different neighborhoods and street sides in Vancouver?
  4. What is the relationship between tree diameter and tree height range across different neighbourhoods?
  5. What is the distribution of root barriers across Vancouver neighbourhoods?
  6. Do root barriers influence tree health across different neighborhoods?

Analysis

Reading Dataset

To begin, I import the Vancouver Street Tree dataset, a 5000-tree sample hosted in the University of British Columbia's UBC-MDS GitHub repository, using pandas (parsing date_planted as dates) to prepare it for analysis.

In [1]:
# Importing altair and pandas libraries
import pandas as pd
import altair as alt

# Loading dataset
df = pd.read_csv('https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv',
                 parse_dates = ['date_planted'])
df.head()
Out[1]:
| | Unnamed: 0 | std_street | on_street | species_name | neighbourhood_name | date_planted | diameter | street_side_name | genus_name | assigned | ... | plant_area | curb | tree_id | common_name | height_range_id | on_street_block | cultivar_name | root_barrier | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10747 | W 20TH AV | W 20TH AV | PLATANOIDES | Riley Park | 2000-02-23 | 28.5 | EVEN | ACER | N | ... | 15 | Y | 21421 | NORWAY MAPLE | 4 | 0 | NaN | N | 49.252711 | -123.106323 |
| 1 | 12573 | W 18TH AV | W 18TH AV | CALLERYANA | Arbutus-Ridge | 1992-02-04 | 6.0 | ODD | PYRUS | N | ... | 7 | Y | 129645 | CHANTICLEER PEAR | 2 | 2300 | CHANTICLEER | N | 49.256350 | -123.158709 |
| 2 | 29676 | ROSS ST | ROSS ST | NIGRA | Sunset | NaT | 12.0 | ODD | PINUS | N | ... | 7 | Y | 154675 | AUSTRIAN PINE | 4 | 7800 | NaN | N | 49.213486 | -123.083254 |
| 3 | 8856 | DOMAN ST | DOMAN ST | AMERICANA | Killarney | 1999-11-12 | 11.0 | EVEN | FRAXINUS | N | ... | 7 | Y | 180803 | AUTUMN APPLAUSE ASH | 4 | 6900 | AUTUMN APPLAUSE | N | 49.220839 | -123.036721 |
| 4 | 21098 | EAST BOULEVARD | EAST BOULEVARD | HIPPOCASTANUM | Shaughnessy | NaT | 15.5 | ODD | AESCULUS | Y | ... | N | Y | 74364 | COMMON HORSECHESTNUT | 4 | 5200 | NaN | N | 49.238514 | -123.154958 |

5 rows × 21 columns

Dataset description

The Vancouver street tree dataset provides a comprehensive listing of public trees located on boulevards throughout the City of Vancouver. The dataset includes essential attributes that describe the characteristics, locations, and classifications of these trees. Below is a detailed description of the key columns within the dataset:

| Column | Description |
|---|---|
| Unnamed: 0 | An automatically generated index column. |
| std_street | The standard name of the street where the tree is located. |
| on_street | The name of the street segment where the tree is planted. |
| species_name | The scientific name of the tree species. |
| neighbourhood_name | The name of the neighbourhood where the tree is situated. |
| date_planted | The date when the tree was planted. |
| diameter | The diameter of the tree in inches, usually measured at breast height (DBH), which helps in assessing the tree's size and maturity. |
| street_side_name | The side of the street where the tree is located (ODD, EVEN, MED, or BIKE MED). |
| genus_name | The genus to which the tree species belongs, providing a higher-level classification. |
| assigned | Indicates whether the tree is associated with a nearby lot (Y=Yes, N=No). |
| civic_number | The civic number of the lot associated with the tree. |
| plant_area | The designated planting area of the tree. |
| curb | Indicates whether the tree is planted next to a curb (Y=Yes, N=No). |
| tree_id | A unique identifier for each tree in the dataset. |
| common_name | The common name of the tree species. |
| height_range_id | An identifier for the tree's height range, which can give insights into the tree's maturity and growth. |
| on_street_block | The block number on the street where the tree is located. |
| cultivar_name | The name of the cultivar or variety of the tree species, if applicable. |
| root_barrier | Indicates whether a root barrier is installed (Y=Yes, N=No). |
| latitude | The latitude coordinate of the tree's location. |
| longitude | The longitude coordinate of the tree's location. |

The dataset is refreshed daily on weekdays to ensure it reflects the most recent updates. However, some attributes may not be updated as frequently due to prioritization and resource allocation. The coordinates were initially provided by the 2016 Geospatial Data for City of Vancouver Street Trees project. In cases where latitude and longitude values are 0, it indicates that the location data for those trees is not available.
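Because a latitude or longitude of 0 marks a tree without location data, those rows should be excluded before plotting coordinates. A minimal pandas sketch, using a hypothetical three-row frame in place of df (the third tree_id, 999999, is invented for illustration):

```python
import pandas as pd

# Hypothetical toy frame; on the real data, use df instead
toy = pd.DataFrame({
    "tree_id": [21421, 129645, 999999],
    "latitude": [49.252711, 49.256350, 0.0],
    "longitude": [-123.106323, -123.158709, 0.0],
})

# Per the dataset notes, a 0 coordinate means "location not available"
has_location = (toy["latitude"] != 0) & (toy["longitude"] != 0)
located = toy[has_location]
print(f"{(~has_location).sum()} of {len(toy)} trees lack coordinates")
```

In the 5000-row sample used here, the numeric summary reports a minimum latitude of about 49.20 and a maximum longitude of about -123.02, so this filter would leave the sample unchanged.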

This dataset serves as a valuable resource for understanding the distribution, diversity, and characteristics of street trees in Vancouver, enabling analyses that can inform urban forestry management and planning.

Data Summary Tables and Methods

In [2]:
# Numeric Description
df.describe()
Out[2]:
| | Unnamed: 0 | date_planted | diameter | civic_number | tree_id | height_range_id | on_street_block | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|
| count | 5000.000000 | 2363 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.00000 | 5000.000000 | 5000.000000 | 5000.000000 |
| mean | 14861.920400 | 2003-09-06 04:03:08.912399488 | 12.340888 | 2975.707600 | 128682.584600 | 2.73440 | 2960.227000 | 49.247349 | -123.107128 |
| min | 2.000000 | 1989-10-31 00:00:00 | 0.000000 | 2.000000 | 36.000000 | 0.00000 | 0.000000 | 49.202783 | -123.220560 |
| 25% | 7192.750000 | 1997-11-06 00:00:00 | 4.000000 | 1300.500000 | 61321.500000 | 2.00000 | 1300.000000 | 49.230152 | -123.144178 |
| 50% | 14870.000000 | 2003-02-12 00:00:00 | 10.000000 | 2639.000000 | 130130.500000 | 2.00000 | 2600.000000 | 49.247981 | -123.105861 |
| 75% | 22366.750000 | 2009-11-17 00:00:00 | 18.000000 | 4123.000000 | 191332.000000 | 4.00000 | 4100.000000 | 49.263275 | -123.063484 |
| max | 29992.000000 | 2019-05-07 00:00:00 | 71.000000 | 9113.000000 | 270750.000000 | 9.00000 | 9100.000000 | 49.293930 | -123.023311 |
| std | 8680.023278 | NaN | 9.266600 | 2078.580429 | 75412.260406 | 1.56957 | 2086.861052 | 0.021251 | 0.049137 |

In the numeric summary, the mean tree diameter is 12.34 inches and the median (the 50% row) is 10 inches, while the 75th percentile is 18 inches. The largest recorded diameter is 71 inches and the smallest is 0 inches. For height, height_range_id runs from 0 (0 to 10 feet) to 9 (90 to 100 feet), with a median of 2 (20 to 30 feet) and a mean of about 2.7.
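Two details in the describe() output are easy to mix up: the median is the 50% row, not the 75% row, and height_range_id is a band code rather than a height in feet. A small sketch, assuming each id k stands for the band from 10k to 10(k+1) feet, as read above (2 means 20 to 30 feet); the helper height_range_label and the toy values are hypothetical:

```python
import pandas as pd

def height_range_label(height_range_id: int) -> str:
    """Turn a height_range_id into a readable 10-foot band.

    Assumption: id k covers k*10 to (k+1)*10 feet (2 -> 20 to 30 feet).
    """
    low = height_range_id * 10
    return f"{low} to {low + 10} feet"

# Hypothetical toy sample standing in for df[["diameter", "height_range_id"]]
toy = pd.DataFrame({"diameter": [4.0, 10.0, 10.0, 18.0, 71.0],
                    "height_range_id": [0, 2, 2, 4, 9]})

# describe() reports the median in its "50%" row
summary = toy["diameter"].describe()
print(summary["50%"], toy["diameter"].median())  # both are 10.0
print(toy["height_range_id"].map(height_range_label).tolist())
```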

In [3]:
# Categorical description
df.describe(include = 'object')
Out[3]:
| | std_street | on_street | species_name | neighbourhood_name | street_side_name | genus_name | assigned | plant_area | curb | common_name | cultivar_name | root_barrier |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5000 | 5000 | 5000 | 5000 | 5000 | 5000 | 5000 | 4950 | 5000 | 5000 | 2658 | 5000 |
| unique | 603 | 607 | 171 | 22 | 4 | 67 | 2 | 38 | 2 | 361 | 176 | 2 |
| top | W 13TH AV | CAMBIE ST | SERRULATA | Renfrew-Collingwood | ODD | ACER | N | 10 | Y | KWANZAN FLOWERING CHERRY | KWANZAN | N |
| freq | 52 | 49 | 463 | 384 | 2554 | 1218 | 4564 | 736 | 4593 | 383 | 383 | 4679 |

In the categorical summary, the dataset contains 171 different species, the most frequent being SERRULATA (463 trees). The trees are located in 22 Vancouver neighbourhoods: Renfrew-Collingwood has the most trees (384), and just over half of the street trees (2554) are on the ODD side of the street. The species belong to 67 genera, the most common being ACER (1218 trees). Most trees (4564) are not assigned to a nearby lot. The plant_area column has 38 distinct values, and the most frequent one, 10, is recorded for 736 trees. Finally, 4593 trees are planted next to a curb, and 4679 trees have no root barrier.
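The top and freq rows report only the most frequent value, so they have to be read together with what that value means: for root_barrier the top value is N, meaning no barrier. A sketch on hypothetical toy columns that reproduces top/freq with value_counts() and turns them into shares:

```python
import pandas as pd

# Hypothetical toy columns; on the real data, use df["root_barrier"], df["curb"]
toy = pd.DataFrame({
    "root_barrier": ["N", "N", "N", "Y", "N"],
    "curb": ["Y", "Y", "N", "Y", "Y"],
})

for col in ["root_barrier", "curb"]:
    counts = toy[col].value_counts()              # sorted from most to least frequent
    top, freq = counts.index[0], counts.iloc[0]   # same as describe()'s top and freq
    share = freq / len(toy)
    print(f"{col}: top={top!r}, freq={freq}, share={share:.0%}")
```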

In [4]:
# Data information
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 21 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   Unnamed: 0          5000 non-null   int64         
 1   std_street          5000 non-null   object        
 2   on_street           5000 non-null   object        
 3   species_name        5000 non-null   object        
 4   neighbourhood_name  5000 non-null   object        
 5   date_planted        2363 non-null   datetime64[ns]
 6   diameter            5000 non-null   float64       
 7   street_side_name    5000 non-null   object        
 8   genus_name          5000 non-null   object        
 9   assigned            5000 non-null   object        
 10  civic_number        5000 non-null   int64         
 11  plant_area          4950 non-null   object        
 12  curb                5000 non-null   object        
 13  tree_id             5000 non-null   int64         
 14  common_name         5000 non-null   object        
 15  height_range_id     5000 non-null   int64         
 16  on_street_block     5000 non-null   int64         
 17  cultivar_name       2658 non-null   object        
 18  root_barrier        5000 non-null   object        
 19  latitude            5000 non-null   float64       
 20  longitude           5000 non-null   float64       
dtypes: datetime64[ns](1), float64(3), int64(5), object(12)
memory usage: 820.4+ KB

The dataset contains 5000 rows and 21 columns. The columns date_planted (2363 non-null, so 2637 missing) and cultivar_name (2658 non-null, so 2342 missing) have many null values, while plant_area has 50 missing values. Although date_planted is one of the most useful columns in the dataset, it is known for fewer than half of the trees, which limits its usefulness for visualization.
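The non-null counts from info() can be turned directly into missing-value counts and percentages. A minimal sketch on a hypothetical four-row frame (values loosely modeled on the first rows above), also showing how to keep only trees with a known planting date before any date-based plot:

```python
import pandas as pd

# Hypothetical toy frame; None becomes NaT/NaN, as in the real columns
toy = pd.DataFrame({
    "date_planted": pd.to_datetime(["2000-02-23", None, "1999-11-12", None]),
    "cultivar_name": [None, "CHANTICLEER", "AUTUMN APPLAUSE", None],
    "plant_area": ["15", "7", "7", None],
})

missing = toy.isna().sum()               # number of nulls per column
missing_pct = toy.isna().mean() * 100    # percentage of nulls per column
print(pd.DataFrame({"missing": missing, "pct": missing_pct})
      .sort_values("missing", ascending=False))

# Rows usable for a plot over planting date
planted = toy.dropna(subset=["date_planted"])
```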

Columns of Interest

  • species_name: This column helps us understand the diversity of tree species in Vancouver.
  • neighbourhood_name: This column allows us to analyze tree distribution across different neighbourhoods.
  • genus_name: This column helps us understand the variety of tree types present in Vancouver's urban forest.
  • diameter: Tree diameter is an important metric for assessing tree size and age across different neighbourhoods.
  • height_range_id: This column indicates the height range category of each tree, allowing us to analyze and compare tree sizes across Vancouver areas.
  • assigned: This column indicates whether a tree is linked to a nearby lot (Y=Yes) or not (N=No), helping us understand the distribution of trees with and without lot assignments.
  • street_side_name: This column shows which side of the street a tree is on (ODD, EVEN, MED, or BIKE MED).
  • root_barrier: This column shows whether a root barrier is installed for each tree.

To make the charts more readable, I replace N with No and Y with Yes in the assigned and root_barrier columns.

In [5]:
df['assigned'] = df['assigned'].replace({'N':'No', 'Y':'Yes'})
df['root_barrier'] = df['root_barrier'].replace({'N':'No', 'Y':'Yes'})

Data Visualization

Which neighborhoods have the highest and lowest number of trees?

In [6]:
bar = alt.Chart(df).mark_bar().encode(
    x = alt.X('count()', title = 'Number of trees', axis = alt.Axis(grid = False)),
    y = alt.Y('neighbourhood_name', title = 'Vancouver neighbourhoods', sort = '-x')
).properties(title = 'Fig 1. Tree Distribution Across Vancouver Neighborhoods')

# Put the total count on each bar
text = bar.mark_text(align = 'left', dx = 2, fontWeight = 700).encode(text = 'count()')

# Combining bar chart and text
chart1 = (bar + text)
chart1
Out[6]:

In Figure 1, Renfrew-Collingwood stands out with the highest number of trees, and Kensington-Cedar Cottage, Hastings-Sunrise, and Dunbar-Southlands also have many trees. Conversely, Strathcona has the fewest trees among Vancouver's neighborhoods. Now that we know how many trees each area has, it would be interesting to explore the median tree height across neighborhoods and how it differs by assignment status. Let's find out.
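The bar lengths in Fig 1 come from Altair's count() aggregate; pandas' value_counts() gives the same per-neighbourhood totals, which is a quick way to check the highest and lowest values read off the chart. A sketch on hypothetical toy rows (on the real data, apply it to df['neighbourhood_name']):

```python
import pandas as pd

# Hypothetical toy rows standing in for df["neighbourhood_name"]
toy = pd.DataFrame({"neighbourhood_name": [
    "Renfrew-Collingwood", "Renfrew-Collingwood", "Renfrew-Collingwood",
    "Sunset", "Sunset", "Strathcona"]})

# Same aggregate as count() in Fig 1, sorted from most to fewest trees
counts = toy["neighbourhood_name"].value_counts()
print("most trees:", counts.idxmax(), counts.max())
print("fewest trees:", counts.idxmin(), counts.min())
```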

What are the differences in median tree height between assigned and unassigned trees across various neighborhoods?

In [7]:
neighbourhood = df['neighbourhood_name'].unique().tolist()
chart2 = alt.Chart(df).mark_point().encode(
    x = alt.X('median(height_range_id)', title = 'Median Tree height range'),
    y = alt.Y('neighbourhood_name', title = 'Vancouver neighbourhoods', scale = alt.Scale(domain = neighbourhood)),
    color = alt.Color('assigned', title = 'Assigned'),
    size = alt.Size('assigned'),
    tooltip = ['median(height_range_id)', 'neighbourhood_name', 'assigned']
).properties(title = 'Fig 2. Median Tree Heights of Assigned vs. Unassigned Across Neighborhoods')
chart2
Out[7]:

From Figure 2, most Vancouver neighborhoods have a similar median tree height range of 2 (20 to 30 feet), whether or not the trees are associated with a nearby lot. However, in several areas, including Strathcona, Shaughnessy, Kitsilano, Kerrisdale, and Kensington-Cedar Cottage, the median height range for assigned trees is higher than for unassigned trees. In contrast, in Dunbar-Southlands the median height range for assigned trees is lower than for unassigned trees. Some differences are to be expected: only 436 of the 5000 trees (5000 - 4564) are associated with a nearby lot, so each neighborhood's median for assigned trees rests on far fewer trees than the median for unassigned trees. It is also important to know the size of trees and where they are located on the street. Next, let's look at the average tree diameter on each street side across Vancouver neighborhoods.
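Fig 2 plots median(height_range_id) for each neighbourhood and assignment status. The same numbers, together with how many trees each median is based on, can be computed with pivot_table and groupby. A sketch on hypothetical toy rows, assuming the Yes/No recoding from cell 5 has been applied:

```python
import pandas as pd

# Hypothetical toy rows standing in for df
toy = pd.DataFrame({
    "neighbourhood_name": ["Kitsilano"] * 4 + ["Sunset"] * 4,
    "assigned": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
    "height_range_id": [4, 5, 2, 2, 2, 2, 2, 3],
})

# One row per neighbourhood, one column per assignment status
medians = toy.pivot_table(index="neighbourhood_name", columns="assigned",
                          values="height_range_id", aggfunc="median")
medians["difference"] = medians["Yes"] - medians["No"]

# Group sizes show how many trees each median rests on
sizes = toy.groupby(["neighbourhood_name", "assigned"]).size().unstack()
print(medians)
print(sizes)
```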

How does the average tree diameter vary between different neighborhoods and street sides in Vancouver?

In [8]:
street_side = sorted(df['street_side_name'].unique().tolist())
chart3 = alt.Chart(df, width = 150).mark_rect().encode(
    x = alt.X('street_side_name', title = 'Street side', scale = alt.Scale(domain = street_side)),
    y = alt.Y('neighbourhood_name', title = 'Vancouver neighbourhoods', scale = alt.Scale(domain = neighbourhood)),
    color = alt.Color('mean(diameter)', title = 'Mean diameter (inch)'),
    tooltip = ['neighbourhood_name', 'street_side_name', 'mean(diameter)']
).properties(title = 'Fig 3. Average Tree Diameter by Neighborhood and Street Side in Vancouver')

chart3
Out[8]:

Figure 3 displays the average tree diameter across Vancouver neighborhoods and street sides. Only Downtown has trees planted on a bike-lane median (BIKE MED), and their average diameter is 3 inches. Strathcona and West End have no trees in the median strip (MED). Surprisingly, Shaughnessy has the highest average diameter in the median strip, 23 inches, followed by Oakridge, Dunbar-Southlands, and Arbutus-Ridge, which also have large average diameters there; in the other areas, median-strip trees average between 5 and 10 inches. Trees on the odd- and even-numbered sides of the street generally average between 10 and 15 inches, except in Downtown, where they average less than 10 inches. So far, we have compared the median tree height range by assignment status and the average tree diameter by street side across Vancouver areas. That raises a natural question: what is the relationship between tree diameter and height range across Vancouver neighbourhoods? Let's find out.
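The heatmap's colour is mean(diameter) for each neighbourhood and street side. In pandas, a pivot_table gives the same grid, and combinations with no trees come out as NaN, which correspond to the empty cells in Fig 3. A sketch on hypothetical toy rows (the values are made up, not read from the chart):

```python
import pandas as pd

# Hypothetical toy rows standing in for df
toy = pd.DataFrame({
    "neighbourhood_name": ["Downtown", "Downtown", "Downtown", "Shaughnessy", "Shaughnessy"],
    "street_side_name": ["BIKE MED", "ODD", "EVEN", "MED", "ODD"],
    "diameter": [3.0, 8.0, 6.0, 23.0, 14.0],
})

# Same aggregate as mean(diameter) in the heatmap; NaN marks
# neighbourhood/street-side combinations that have no trees
heat = toy.pivot_table(index="neighbourhood_name", columns="street_side_name",
                       values="diameter", aggfunc="mean")
print(heat)
```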

What is the relationship between tree diameter and tree height range across different neighbourhoods?

In [9]:
# List and sort all categories
cat_class = sorted(df['neighbourhood_name'].unique().tolist())

# Binding selection
select = alt.binding_select(name = 'Neighbourhood Name: ', options = cat_class)

# Selection point with binding
menu = alt.selection_point(fields = ['neighbourhood_name'], bind = select)

# Scatter plot with selection point
chart4 = alt.Chart(df, width = 350, height = 300).mark_circle(size = 45).encode(
    x = alt.X('height_range_id', title = 'Tree height range', scale = alt.Scale(domain = [0, 10])),
    y = alt.Y('diameter', title = 'Tree diameter (inch)'),
    stroke = alt.Stroke('neighbourhood_name', legend = None),
    tooltip = ['height_range_id', 'diameter', 'species_name', 'genus_name', 'street_side_name', 'assigned'],
    opacity = alt.condition(menu, alt.value(0.95), alt.value(0))
).add_params(menu).properties(title = 'Fig 4. Relationship between tree diameter and height range across neighbourhoods')
chart4
Out[9]:

Figure 4 suggests a positive relationship between tree diameter and tree height range across neighbourhoods: trees in higher height ranges tend to have larger diameters.
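Because height_range_id is an ordered band rather than a continuous measurement, a rank-based (Spearman) correlation is one reasonable way to put a number on the pattern in Fig 4. The sketch below computes it as the Pearson correlation of tie-averaged ranks, overall and per neighbourhood; the helper spearman and the toy rows are hypothetical:

```python
import pandas as pd

# Hypothetical toy rows standing in for df
toy = pd.DataFrame({
    "neighbourhood_name": ["Kitsilano"] * 4 + ["Sunset"] * 4,
    "height_range_id": [1, 2, 3, 4, 1, 2, 2, 5],
    "diameter": [3.0, 8.0, 15.0, 30.0, 4.0, 6.0, 9.0, 25.0],
})

def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of tie-averaged ranks."""
    return x.rank().corr(y.rank())

overall = spearman(toy["height_range_id"], toy["diameter"])
by_area = (toy.groupby("neighbourhood_name")[["height_range_id", "diameter"]]
              .apply(lambda g: spearman(g["height_range_id"], g["diameter"])))
print(round(overall, 3))
print(by_area.round(3))
```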

What is the distribution of root barriers across Vancouver neighbourhoods?

In [10]:
title = alt.TitleParams('Fig 5. Distribution of Root Barriers Across Vancouver Neighborhoods', anchor = 'middle', dy = -5)
chart5 = alt.Chart(df).mark_bar().encode(
    x = alt.X('count()', title = 'Number of trees'),
    y = alt.Y('neighbourhood_name', sort = 'x', title = 'Vancouver neighbourhoods'),
    color = alt.Color('root_barrier', title = 'Root barrier'),
    column = alt.Column('root_barrier', title = None),
    tooltip = ['neighbourhood_name', 'root_barrier', 'count()']
).resolve_scale(y = 'independent').properties(title = title)
chart5
Out[10]:

Figure 5 shows that in most Vancouver neighborhoods, the majority of trees have no root barrier. Hastings-Sunrise, Renfrew-Collingwood, and Sunset stand out with the most root barrier installations: 41, 39, and 36 trees, respectively. This raises the question of why root barriers are not more widely used in Vancouver: is there a specific reason, and do they have any impact on tree health?
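Raw counts favour neighbourhoods that have more trees overall, so the share of trees with a root barrier can be a fairer comparison than the bar lengths in Fig 5. A sketch with pd.crosstab on hypothetical toy rows, assuming the Yes/No recoding from cell 5:

```python
import pandas as pd

# Hypothetical toy rows standing in for df
toy = pd.DataFrame({
    "neighbourhood_name": ["Sunset", "Sunset", "Sunset", "Kitsilano", "Kitsilano"],
    "root_barrier": ["Yes", "No", "Yes", "No", "No"],
})

# Counts per neighbourhood and barrier status (the bars in Fig 5)
table = pd.crosstab(toy["neighbourhood_name"], toy["root_barrier"])
# Share of each neighbourhood's trees that have a barrier
table["pct_yes"] = 100 * table["Yes"] / (table["Yes"] + table["No"])
print(table.sort_values("Yes", ascending=False))
```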

Do root barriers influence tree health across different neighborhoods?

In [11]:
chart6 = alt.Chart(df).mark_point(size = 70, filled = True).encode(
    x = alt.X('mean(diameter)', title = 'Average tree diameter (inches)', axis = alt.Axis(gridColor = 'brown', gridOpacity = 0.1)),
    y = alt.Y('neighbourhood_name', title = 'Vancouver neighbourhoods', scale = alt.Scale(domain = neighbourhood)),
    color = alt.Color('root_barrier', title = 'Root barrier', scale = alt.Scale(scheme = 'set1')),
    tooltip = ['neighbourhood_name', 'root_barrier', 'mean(diameter)']
).properties(title = 'Fig 6. Average Tree Diameter with Root Barrier Installation Across Neighborhoods')
chart6
Out[11]:

Figure 6 shows that trees with root barriers tend to be smaller. Across Vancouver neighbourhoods, the average diameter of trees with root barriers ranges from 4 to 8 inches, while for trees without root barriers it ranges from 10 to 16 inches. This sizeable difference is consistent with root barriers hindering tree development, although a comparison of averages alone does not show that the barriers cause slower growth.
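One way to probe the Fig 6 comparison further is to compare diameters within each height range, so that trees of similar size are compared with each other. The sketch below is hypothetical: its toy numbers are constructed so that an overall gap of about 3.3 inches disappears within each height range, illustrating how a different mix of tree sizes could by itself produce a gap; they say nothing about the real data.

```python
import pandas as pd

# Hypothetical toy rows standing in for df (after the Yes/No recoding)
toy = pd.DataFrame({
    "root_barrier": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "height_range_id": [1, 1, 1, 3, 3, 3],
    "diameter": [4.0, 6.0, 5.0, 16.0, 14.0, 15.0],
})

# Overall comparison, as in Fig 6, plus group sizes
overall = toy.groupby("root_barrier")["diameter"].agg(["mean", "count"])

# Same comparison within each height range
within = toy.pivot_table(index="height_range_id", columns="root_barrier",
                         values="diameter", aggfunc="mean")
print(overall)
print(within)
```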

Discussion

In this project, we visualized a 5000-tree sample of the Vancouver street tree dataset, hosted in the University of British Columbia's UBC-MDS repository, focusing on several key columns to address our project questions.

In Figure 1, we can see that Strathcona has fewer trees than any other Vancouver area and Renfrew-Collingwood has the most, while Kensington-Cedar Cottage, Hastings-Sunrise, and Dunbar-Southlands also have many trees. Overall, a substantial number of trees has been planted across Vancouver neighbourhoods, and this number is expected to grow over time.

In Figure 2, which shows the median tree height range by area and assignment status, most Vancouver areas have a median tree height of 20 to 30 feet, whether or not the trees are assigned to a nearby lot. However, in several areas, including Strathcona, Shaughnessy, Kitsilano, Kerrisdale, and Kensington-Cedar Cottage, the median tree height for assigned trees is higher than for unassigned trees, whereas in Dunbar-Southlands it is lower. Some differences were expected, since only 436 of the 5000 trees are associated with a nearby lot.

In Figure 3, we visualized the average tree diameter across Vancouver areas and street sides. Only Downtown has trees planted on a bike-lane median (BIKE MED), with an average diameter of 3 inches, and Strathcona and West End have no trees in the median strip (MED). Surprisingly, Shaughnessy has the highest average median-strip diameter (23 inches), followed by Oakridge, Dunbar-Southlands, and Arbutus-Ridge, while median-strip trees in the other areas average between 5 and 10 inches. Trees on the odd- and even-numbered sides of the street generally average between 10 and 15 inches, except in Downtown, where they average less than 10 inches.

In Figure 4, we visualized tree diameter against height range across Vancouver neighbourhoods. There is a clear positive relationship: trees in higher height ranges tend to have larger diameters.

In Figures 5 and 6, we found that most trees in Vancouver neighbourhoods have no root barrier, and that Hastings-Sunrise, Renfrew-Collingwood, and Sunset stand out with the most installations (41, 39, and 36 trees, respectively). Trees with root barriers also have smaller average diameters. One possible reason root barriers are not installed more widely is that they may hinder tree growth and development, although this comparison alone cannot confirm it.

There are still more interesting questions to explore, such as which tree species and genera are most common in each Vancouver area, how trees are distributed across street sides within each neighbourhood, and how trees are geographically distributed across Vancouver.

Dashboard

The scatter plot serves as the selector plot. Dragging a rectangle (an interval brush) over the scatter plot filters the other plots to the trees whose height range and diameter fall inside it. The scatter plot also has a drop-down menu for focusing on a single Vancouver area; choosing an area hides the other areas' marks in the other plots. The assignment-status and root-barrier charts have clickable legends, and the heatmap has radio buttons for highlighting a street side.

In [12]:
# Selector plot
# Rebuilds the Fig 4 scatter plot, without the fixed x-axis domain and with selected points fully opaque
cat_class = sorted(df['neighbourhood_name'].unique().tolist())
select = alt.binding_select(name = 'Neighbourhood Name: ', options = cat_class)
menu = alt.selection_point(fields = ['neighbourhood_name'], bind = select)
chart4 = alt.Chart(df, width = 350, height = 300).mark_circle(size = 45).encode(
    x = alt.X('height_range_id', title = 'Tree height range'),
    y = alt.Y('diameter', title = 'Tree diameter (inch)'),
    tooltip = ['height_range_id', 'diameter', 'species_name', 'genus_name', 'street_side_name', 'assigned'],
    stroke = alt.Stroke('neighbourhood_name', legend = None),
    opacity = alt.condition(menu, alt.value(1), alt.value(0))
).add_params(menu).properties(title = 'Fig 4. Relationship between tree diameter and height range across neighbourhoods')

# Second selection: an interval brush on the scatter plot, used to filter the other panels
interval = alt.selection_interval()
selector = chart4.encode(color = alt.condition(interval, 'neighbourhood_name', alt.value('white'))).add_params(interval).properties(
    title = 'Relationship between tree diameter and height range across neighbourhoods'
)
In [13]:
# Chart 2 gets a clickable legend
legend_bind = alt.selection_point(fields = ['assigned'], bind = 'legend')
plot1 = chart2.encode(color = alt.condition(legend_bind, 'assigned', alt.value('white'))).add_params(legend_bind)
# Linking plot 1 with selector plot
panel_1 = plot1.encode(opacity = alt.condition(menu, alt.value(1), alt.value(0))).add_params(menu).transform_filter(interval).properties(
    title = 'Median Tree Heights of Assigned vs. Unassigned Across Neighborhoods'
)

# Chart 3 gets radio buttons for choosing a street side
sort_street_side = sorted(df['street_side_name'].unique().tolist())
radio = alt.binding_radio(name = 'Street side name: ', options = sort_street_side)
button = alt.selection_point(fields = ['street_side_name'], bind = radio)
plot2 = chart3.encode(color = alt.condition(button, 'mean(diameter)', alt.value('white'))).add_params(button)
# Linking plot 2 with selector plot
panel_2 = plot2.encode(opacity = alt.condition(menu, alt.value(1), alt.value(0))).add_params(menu).transform_filter(interval).properties(
    title = 'Average Tree Diameter by Neighborhood and Street Side in Vancouver'
)

# Chart 6 gets a clickable legend
leg_bind = alt.selection_point(fields = ['root_barrier'], bind = 'legend')
plot3 = chart6.encode(color = alt.condition(leg_bind, 'root_barrier', alt.value('white'))).add_params(leg_bind)
# Linking plot 3 with selector plot
panel_3 = plot3.encode(opacity = alt.condition(menu, alt.value(1), alt.value(0))).add_params(menu).transform_filter(interval).properties(
    title = 'Average Tree Diameter with Root Barrier Installation Across Neighborhoods'
)
In [14]:
((selector | panel_1).resolve_scale(color = 'independent', size = 'independent', shape = 'independent', stroke = 'independent') 
 & (panel_2 | panel_3).resolve_scale(color = 'independent', size = 'independent', shape = 'independent', stroke = 'independent'))
Out[14]: